feat: Add TimestampNTZType support for casts and unix_timestamp#3253
feat: Add TimestampNTZType support for casts and unix_timestamp#3253andygrove wants to merge 19 commits intoapache:mainfrom
Conversation
The csv_scan.rs file uses types from datafusion_datasource but the dependency was not declared in native/core/Cargo.toml. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive support for TimestampNTZType (Timestamp without timezone) wherever TimestampType is currently supported. Changes: - Add TimestampNTZType to CometCast.supportedTypes - Support casting TimestampNTZ to Long, String, Date, and Timestamp - Add TimestampNTZ support to unix_timestamp function (no timezone conversion) - Add tests for TimestampNTZ casts and temporal expressions TimestampNTZ stores local time without timezone context, so: - unix_timestamp simply divides microseconds by 1,000,000 - Casts to Date use simple truncation (no timezone adjustment) - Casts to String format as local datetime without timezone suffix Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3253 +/- ##
============================================
+ Coverage 56.12% 59.90% +3.77%
- Complexity 976 1473 +497
============================================
Files 119 175 +56
Lines 11743 16172 +4429
Branches 2251 2686 +435
============================================
+ Hits 6591 9688 +3097
- Misses 4012 5131 +1119
- Partials 1140 1353 +213 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This commit fixes an issue where date_trunc on timestamp_ntz values could produce incorrect results when the truncation crosses DST boundaries (e.g., truncating a December date to October). The fix modifies as_micros_from_unix_epoch_utc to re-interpret the local datetime in the timezone after truncation, ensuring the correct DST offset is used for the target date. Also updates the test to use a reasonable date range (around year 2024) since chrono-tz has limited support for DST calculations with far-future dates (beyond approximately year 2100). Adds documentation about this known limitation to the compatibility guide. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…test - Remove cast tests for TimestampNTZType to LongType (Spark doesn't support) - Remove cast tests for numeric types and BinaryType to TimestampNTZType (Spark doesn't support these casts) - Fix date_format timestamp_ntz test by using UTC timezone explicitly (Comet interprets timestamp_ntz as UTC during cast, which differs from Spark's session timezone behavior for non-UTC timezones) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…pport # Conflicts: # docs/source/user-guide/latest/compatibility.md
|
What is the expected behavior when data is written to a timestamp ntz field with one session timezone and read by another user with a different session timezone? |
|
I am moving this to draft until the DataFusion 52 upgrade is merged. |
Skip timezone conversion for TimestampNTZ arrays in timestamp_trunc. NTZ values are timezone-independent, so truncation operates directly on naive microsecond values without any timezone resolution, avoiding panics on ambiguous DST fall-back times (e.g. 2024-11-03T01:30:00 in US/Eastern). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks. I added these tests and that did expose a bug, which is now fixed. |
|
Merged latest from The remaining CI failure ( |
Three tests added by this PR assert full native execution via checkSparkAnswerAndOperator, but the underlying expressions legitimately fall back to Spark under the PR's declared scope (casts and unix_timestamp only): - hour/minute/second on TimestampNTZ are marked Incompatible (apache#3180) - date_trunc falls back for non-UTC session timezones (apache#2649) because Catalyst wraps the NTZ child in cast(ts_ntz as timestamp) Use checkSparkAnswer for the fallback paths and retain checkSparkAnswerAndOperator where Comet runs natively (cast, unix_timestamp, date_trunc in UTC).
|
@andygrove PTAL at #4039. It is based on this PR, squashed, rebased on main, and with the cast changes removed. (I included one cast test from this PR which was not covered by the ones already there). |
|
replaced by #4039 |
Summary
Add comprehensive support for
TimestampNTZType(Timestamp without timezone) whereverTimestampTypeis currently supported.Key changes:
TimestampNTZTypetoCometCast.supportedTypesunix_timestampfunction (no timezone conversion needed)Semantic differences from TimestampType:
unix_timestamp(TimestampNTZ): Simply divides microseconds by 1,000,000 (no timezone adjustment)Test plan
./mvnw test -DwildcardSuites="CometCast"./mvnw test -DwildcardSuites="CometTemporalExpression"🤖 Generated with Claude Code